Evolutionary genomics of socially polymorphic populations of Pogonomyrmex californicus

Background Social insects vary considerably in their social organization both between and within species. In the California harvester ant, Pogonomyrmex californicus (Buckley 1867), colonies are commonly founded and headed by a single queen (haplometrosis, primary monogyny). However, in some populations in California (USA), unrelated queens cooperate not only during founding (pleometrosis) but also throughout the life of the colony (primary polygyny). The genetic architecture and evolutionary dynamics of this complex social niche polymorphism (haplometrosis vs pleometrosis) have remained unknown. Results We provide a first analysis of its genomic basis and evolutionary history using population genomics comparing individuals from a haplometrotic population to those from a pleometrotic population. We discovered a recently evolved (< 200 k years), 8-Mb non-recombining region segregating with the observed social niche polymorphism. This region shares several characteristics with supergenes underlying social polymorphisms in other socially polymorphic ant species. However, we also find remarkable differences from previously described social supergenes. Particularly, four additional genomic regions not in linkage with the supergene show signatures of a selective sweep in the pleometrotic population. Within these regions, we find for example genes crucial for epigenetic regulation via histone modification (chameau) and DNA methylation (Dnmt1). Conclusions Altogether, our results suggest that social morph in this species is a polygenic trait involving a potential young supergene. Further studies targeting haplo- and pleometrotic individuals from a single population are however required to conclusively resolve whether these genetic differences underlie the alternative social phenotypes or have emerged through genetic drift. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-024-01907-z.


Figures S1 to S17
Tables S1 to S5 Extended Methods
Queens were snap-frozen in liquid nitrogen and transferred to vials containing 0.5 ml RNAlater (Invitrogen).There are no legal regulations concerning the collection or treatment of ants in the USA or Germany.All ants were barcoded using Cytochrome Oxidase I (COI) to confirm species identification and determine mitochondrial variation in both populations.

DNA extraction, library prep and genome sequencing
DNA was extracted using phenol-chloroform following [36].Libraries were constructed using NEBNext ® Ultra ™ II FS DNA Library Prep Kit for Illumina (NEB #E7805; New England BioLabs, Ipswich, USA) according to manufacturer's protocol.DNA extracts and libraries were controlled for purity and size using Nanodrop and Bioanalyzer 2100 (Agilent, Santa Clara, USA).Libraries were paired-end sequenced (2x 75 bp) at the Genomic Core Facility of the University of Münster (Germany), using a NextSeq 500 system with NextSeq 500 High Output Kit v2.5 (Illumina, San Diego, USA).We sequenced 16 queens from the H-population and 19 from the P-population at an average coverage of 12x (Additional file 2: Table S5).

Genome assembly and annotation
We sequenced and assembled the genome of P. californicus ("Pcal.v2")using data from MinION long reads (MinION, Oxford Nanopore Technologies, Oxford, UK), Illumina short reads, and previously published 10X Chromium linked reads [34].For long read library preparation, we sampled three white worker pupae from a pleometrotic colony ("Pcal18-02") in liquid nitrogen for high-molecular weight DNA extraction using a phase-separation protocol, optimized for DNA extraction from ants.After quality control with agarose gels, Nanodrop and Qubit measurements, we used 2 µg of DNA for library preparation with the Ligation Sequencing Kit (Oxford Nanopore) SQK-LSK109.The resulting library was sequenced for 48 h on a MinION sequencer, using a single flow cell.After basecalling with guppy (v.4.0.14) we retrieved 1,650,688 reads (mean read length 4.9 kb) with a mean read quality of 13.4.Read length N50 was 18.5 kb and a total of 8.1 Gb of data was generated.We used PoreChop (v.0.2.4) to remove adapter contaminations from MinION reads before further filtering the data with FiltLong (v.0.2.0), using paired-end Illumina data from three samples (P50Y, P67P, P55R).We assembled the genome from ~30x filtered Nanopore data using the following strategy.We created one assembly using CANU (v.2.0) [47] and a second assembly using wtdbg2 (v.2.5).Both assemblies were processed and optimized as follows: After contig assembly, we used SSPACE (v.3.0)[58] for scaffolding, LR_closer for gap filling, and ntedit for polishing with long-read data.We then used short-read, short-insert Illumina data from three samples (P50Y, P67P, P55R) for scaffolding with SSPACE before three rounds of polishing with Pilon [116].The polished assembly was super-scaffolded with publicly available 10X Chromium data for P. californicus (accession SRX8051306 [34]) using scaff10x and subsequent correction with break10x.We used an in-house, blastn-based pipeline to remove bacterial contaminations from each assembly.Finally, we combined the CANU-based and the wtdbg-based assemblies with ntjoin, using the CANU-based assembly as reference.The joint assembly was filtered for redundancy using funannotate clean (v.1.7.1) (https://github.com/nextgenusfs/funannotate/tree/1.7.1), removing four scaffolds.

Genome annotation
We annotated repeats and transposable elements (TEs) in the genome using a combination of de- After marking duplicates using Picard's MarkDuplicates (v.2.20.0), we performed joint-variant calling across all samples, in two rounds.
We first generated a high quality set of SNPs that we used for base quality recalibration in the second pass.In each pass, we used GATK's HaplotypeCaller (v.4.1.2.0) with default parameters [37].Using this approach, we recovered a total of 1,486,819 SNPs that we hard-filtered using the GATK VariantFiltration tool with the following parameters: the variant confidence normalized by depth (QD < 2.0), estimates of the variant strand bias test (FS > 60.0 and SOR > 3.0), mapping quality (MQRankSum < -5.0 and MQ < 40.0) and the rank sum test for the positioning of variants within (readsReadPosRankSum < -5.0).We based our choice of these parameters on a visual

Population structure analyses
We used PLINK to LD-prune the 613,411 SNPs at a threshold of r 2 < 0.2, using the --indep-pairwise 1 kb 1 0.2 command.This yielded 314,756 LD-pruned SNPs that we then used for Principal Component Analyses (PCA) using the --pca command of PLINK (v.1.90p) [41,42], considering four eigenvectors.The LD-pruned SNPs were further used for ADMIXTURE (v.1.3.0)analyses [43] with 2 to 4 clusters.The best fit was selected based on cross-validation error rates.
To better characterize the relationships among queens in our sample, we used the haplotype-based approach implemented in fineSTRUCTURE [44].In this method, each individual is considered in turn as a recipient, whose chromosomes are reconstructed as a combination of haplotypes from all other individuals (the donors), using ChromoPainter [44].Closely related individuals are expected to share more haplotype blocks with each other.The final output is a coancestry matrix that can be visualized as a heat map summarizing the number of shared haplotypes between all pairs of individuals in the analyzed data sets.
We first converted SHAPEIT output to ChromoPainter's PHASE format using the impute2chromopainter.pl provided by the developers of fineSTRUCTURE.We ran fineSTRUCTURE (v.4.1.0)under the 'linked' mode with default parameters.
Effective population size (Ne) and the relative cross-coalescence rate (rCCR) were then calculated using MSMC2 following the developer's guidelines [48,49].We ran 1,000 bootstrap replicates, each time randomly sampling 60 Mb for each of the analyzed genomes.To obtain the time in years, we assumed a generation time of 4 years and a similar mutation rate as in honey bees [50], i.e. 3.6 * 10 -9 mutations * nucleotide -1 * generation -1 .
VCFtools' --het function was employed to calculate heterozygosity and inbreeding coefficient on a per individual basis.To reduce the effects of mapping errors, windows where the average mapping quality is below 40, and/or show extreme depth of coverage and/or SNP number were excluded from further analyses (Additional file 2: Fig. S17).
Linkage disequilibrium (LD) decay was estimated by calculating pairwise r 2 values between all markers within 200 kb windows using VCFtools' --hap-r2 function in each of the two populations.
We used SNP markers with a minor allele frequency (MAF) of 0.2 to minimize the effects of rare variants.We next averaged r 2 values between all pairs of markers in 100 bp distance bins for ease of visualization.
We tested for statistical differences between populations by performing a Wilcoxon rank sum test with "Bonferroni" adjustments of the p-values in R (v.4.0.2) [46].

Genome scan for selection
To identify candidate genes potentially responsible for the social polymorphism in P. californicus, we screened genomes of both populations for evidence of selection using an FST outlier approach and by identifying regions of extended haplotype homozygosity (selective sweeps), comparing the haplometrotic and pleometrotic populations [51][52][53][54].
Prior to computing FST values between the two populations, we estimated relatedness between queens using PLINK's --make-rel function [41,42].Based on visual inspection of the distribution of relatedness values, we excluded queens from pairs with relatedness > 0.4 leaving us with 17 (of 19) and twelve (of 16) queens from the pleometrotic and haplometrotic populations, respectively.
Moreover, we excluded one queen from the pleometrotic population as our population structure analysis indicated that it was a F1 hybrid (P99P).
We excluded an ambiguous region on scaffold 11 (position: 1,200,000-1,349,402) that showed opposing pattern of selection (i.e.FST-based approach suggested selection in the pleometrotic population while the xpEHH indicated selection in the haplometrotic population).

Characterization of candidate genes and SNPs
We used BEDtools' intersect function (v.2.29.2) [61] to extract regions identified simultaneously by the FST-and xpEHH-based approach.Next, we used SnpEff (v.5.0) [62] to annotate putative effects on protein-coding genes of all 3,347 SNPs contained in the candidate regions (Table 1).
The FST-and xpEHH-based approaches to detect selection yielded a set of 145 candidate genes (including 97 in the supergene region).To identify putative function of these candidate genes, we performed a BLAST (blastx) [63] search against the nr database (January 2020).In addition to BLAST, we used OrthoFinder (v.2.3.1)[64] to identify Drosophila melanogaster orthologs to our candidate genes.Further, we used RepeatProteinMask to identify genes encoding TE proteins.For the manual review of the 48 genes contained in genomic regions showing evidence for a selective sweep outside the supergene, we removed TE-encoded genes as well as genes without RNAseq support and no known homology to any proteins in the nr database, resulting in 34 putatively functional genes (Additional file 6).Of these, 18 genes showed non-synonymous variants in our dataset.In six of these 18 genes at least one non-synonymous variant showed a pattern where over 90% of the individuals from the H-population carried the alternative allele (i.e.either 0/1 or 1/1) and over 90% of the individuals from the P-population carried the reference allele (i.e.either 0/1 or 0/0) (Fig. 4a).

Haplotype frequencies
We calculated haplotype frequencies for the putative supergene region and tested for potential departure from Hardy-Weinberg equilibrium using a  !test (Additional file 2: Table S3).Haplotypes were assigned manually, based on visual inspection of the suggested supergene region (Additional file 2: Fig. S8); predominantly homozygous for the reference alleles (i.e. from a pleometrotic population) and alternate allele were assigned "Sp/Sp" and "Sh/Sh", respectively, and predominantly heterozygous regions were assigned "Sh/Sp".

RAD sequencing and analysis
Winged males from a monogynous colony at Lake Henshaw (H-population) were collected during the mating season 2019 and immediately stored in 100% ethanol.DNA was isolated using a phenol-chloroform method [36].Tests using six microsatellite markers confirmed that all males belonged to the same matriline (results not shown).Through visual inspection of the genotypes of haplometrotic queens (Additional file 2: Fig. S6), we anticipated that the males were progeny of a queen heterozygous at the nonrecombining region.Following Inbar et al.
[65], we constructed a reduced representation genomic library using an adjusted double-digest Restriction-site Associated DNA sequencing (ddRADseq) protocol [66,67].First, we digested DNA withEcoR1 as rare-cutter and Mse1 as frequent-cutter restriction enzyme.Then, we ligated DNA fragments to barcoded adaptors for multiplexing (using CutSmart buffer NEB).The adaptor-ligated products were PCR amplified, pooled, and size-selected using Beckman Coulter, Agencourt AMPure XP beads.The libraries were sequenced in one lane of a paired-end 150 bp reads on a HiSeq X Ten Illumina sequencer.
We used Trimmomatic v.0.39 [88] to remove adaptor sequences with the following settings: ).A skeleton map was generated using delegate markers for all markers that showed 100% linkage.Kosambi's mapping function was used to convert recombination frequencies to centimorgan [72].The complete map will be published elsewhere, for this manuscript we are only reporting the results for LG 14 that harbors a major non-recombining area (supergene).

Characterization of the supergene
To investigate the potential supergene identified in the current study, we used SNP markers from queens (whole genome resequencing) and males (RADseq).We used BCFtools' (v.1.9)isec and merge functions to extract shared SNPs between the two datasets.
182 out of the resulting 1,064 SNP markers were used to visualize the genotypes of the males and queens at LG14 using VIVA (v.0.4.0) [73].To further characterize the two haplotypes of the supergene, we performed a PCA on 69 SNP markers identified in the non-recombining area of LG14 (i.e. scaffolds 22, 23, 38, 42, 48, and parts of 6 and 14) in datasets of males and queens (Fig. 3c; Additional file 2: Fig. S11).
We next used LDheatmap (v.1.0-4)[75] to generate heatmap plots to visualize pairwise linkage disequilibrium (LD).The lack of recombination between Sh and Sp was verified by calculating pairwise linkage disequilibrium in a pool of twelve queens (six homozygous for Sh and six homozygous for Sp).Further, we used another six heterozygous (Sh/Sp) queens (H118W, H123W, H144W, H46G, H89W and P99P) to confirm the lack recombination between Sh and Sp (Additional file 2: Fig. S12).
To investigate the genomic architecture of the supergene, we used BEDOPS' bedmap program [76] to estimate TE as well as gene content across LG14 in 100 kb sliding windows based on annotations produced as described above.
To explore whether any of the transcripts previously shown to be associated with aggression of queens depending on their social context [77] belong to the supergene, we aligned the assembled transcripts to the Pcal.v2reference genome using GMAP [78].We only retained the top hits with at least 70% query coverage, yielding 7303 transcripts (out of 7890) unambiguously aligned to the genome.We then used BEDtools' intersect function (v.2.29.2) [61] to extract transcripts that mapped to the supergene region (Additional file 3).
To test whether genes found in the putative supergene are enriched for specific functions, we ran a GO enrichment analysis.We defined these regions.We used BEDTools' intersect (v.2.30.0) [61] to extract 682 genes located in the nonrecombining region and ran topGO [81] in R.

Dating the supergene formation
We approximately estimated the divergence between Sh and Sp by computing the rCCR, using MSMC2 [49] on four single individual queens with the Sh (H104W, H38W, H73W and H98W) and the Sp alleles (P114W, P50R, P84P and P97R).We limited the analysis to the supergene region (scaffolds 22, 23 and part of scaffold 6; larger than 1 Mb [82]) and ran 500 bootstraps by randomly sampling 2 Mb each time.Further, we performed a phylogenetic analysis of Sh and Sp alleles with P. subnitidus as an outgroup species using RAxML [83].Sequencing data of one P. subnitidus individual was generated and processed as described above.Joint SNP calling (with the other 35 queens sequenced in this study) was performed using GATK [37].Next, 33,424 SNPs (only biallelic, polymorphic and missingness < 0.2) at the supergene region (scaffolds 22, 23, 38, 42, 48 and parts of scaffolds 6 and 14) were extracted and an alignment file was generated based on these SNPs for six queens homozygous for the Sh allele (H104W, H33G, H34G, H38W, H73W and H98W), six others homozygous for the Sp allele (P22P, P102R, P114W, P109R, P34R, P84P) and one P. subnitidus sample.The alignment file was then used to construct a maximum-likelihood tree using RAxML [83] under the GTRGAMMA model with bootstrapping.
We then used a fossil-calibrated dated phylogeny [84] to date the split between the Sh and Sp haplotype groups using a phylogenetic approach.The Ward et al. [84] phylogeny dated the split between P. vermiculatus and P. imberbiculus at 18 (95% highest probability density (HPD) ±8) MYA.This is an upper bound on the speciation date of P. californicus and P. subnitidus according to the most recent Pogonomyrmex phylogeny [85].By comparing the P. subnitidus branch length (1.273) and the average path length from the split between Sh-Sb to the Sh leaves (0.0325) (haplometrotic queen sequences), we calculated a ratio of 1:39 between the speciation time of P.
subnitidus and P. californicus, and the Sh-Sb haplotypes divergence time.Applying this ratio to the dating reference (i.e.18±8 MYA), we obtained 0.46 (95% HPD ±0.20) MYA as an upper bound for the divergence between the two haplotype groups, which is consistent with the MSMC2 analysis.We note that the confidence in the age inference is strengthened by the agreement between two distinct approaches: the inference by MSMC2 is based on a coalescent model, which is a distinct approach from the phylogenetic dating approach.The former approach depends on estimates of mutation rate and generation time, whereas the latter depends on fossil calibration points.The tree figure was produced using interactive Tree Of Life (iTOL) [86].1.00 1 2 3 G P 9 9 P P 1 1 4 W P 3 4 R P 1 3 4 P P 1 0 9 R P 1 1 7 P P 5 0 R P 1 2 9 P P 2 5 R P 2 2 P P 2 9 P P 3 0 B P 1 0 2 R P 5 0 Y P 9 7 R P 5 B P 8 4 P P 5 5 R P 6 7 P ancestry CV error (K=4): 0.63

Fig. S5
Linkage disequilibrium (LD) decay displayed for each of the 25 largest scaffolds in the two P. californicus populations.LD as measured by pairwise r 2 is represented on the y-axis and distance between pairs of markers within 10 kb on the x-axis.Lines and shaded areas are means and 95% confidence intervals, respectively.

Fig. S6
Heatmap showing the genotypes (scaffolds 22 and 23) of 35 queens originating from pleometrotic and haplometrotic populations.Each row corresponds to a queen, while columns represent a subset of 2,729 SNPs (thinned out of 6,098 SNPs for ease of visualization) with a minor allele count of at least 3 on both scaffolds.californicus.The heatmap displays the genotypes of the 35 queen samples from the pleometrotic and haplometrotic populations.Rows represent a total of 5,363 SNPs (thinned out of 68,643 SNPs for ease of visualization) with a MAF ≥ 0.2, spanning seven scaffolds of LG14.Columns (blue for haplometrotic and green for pleometrotic) represent individual queens grouped by population (haplometrotic and pleometrotic).Black rectangle delimits the non-recombining supergene region, spanning scaffolds 22,23,38,42,48 and parts of scaffolds 6 and 14.Queen P99P is an F1 hybrid collected from a pleometrotic population.

Fig. S14
Signatures of site-specific extended haplotype homozygosity (EHHS) at two outlier SNPs in the pleometrotic (green), but not the haplometrotic population (blue) indicating recent selective sweeps.The position of each SNP is indicated at the top of each plot.Our six best candidate genes are found in these regions of elevated extended haplotype homozygosity scores.

Fig. S15
Genotypes of haplo-and pleometrotic queens for candidate genes under selection (a-f).
Each plot shows SNP positions (top, solid lines), the affected features of the candidate gene (gene models in purple), and the genotypes for each queen in the two populations.Pleometrotic queens are almost all homozygous and fixed for the reference allele, unlike the haplometrotic queens that are more variable.Plots were produced using GenotypePlot [https://github.com/JimWhiting91/genotype_plot].Note that the bottom row in the pleometrotic population displays the genotype of the hybrid P99P queen.

Fig
Fig. S13).To determine in which population the differentiated windows are under selection, we computed nucleotide diversity π and Tajima's D in 100 kb non-overlapping windows across the genome (Additional file 2: Fig.S4) and reasoned that these regions should show a reduced Tajima's D (typically negative values) and nucleotide diversity in the population where they are subject of selection.Genomic windows failing these conditions were flagged as ambiguous and were discarded from further processing, resulting in a total of 51 candidate regions evolving under divergent selection between the populations (Additional file 4).FST estimates can be affected due to variation in recombination rate across the genome[55].Hence, we further applied the cross-population extended haplotype homozygosity (xpEHH) and the site-specific extended haplotype homozygosity (EHHS) tests[56, 57], to detect signatures of selective sweeps.The xpEHH compares the length of haplotypes between populations to detect selective sweeps in which the selected allele has approached or reached fixation in at least one population[56].The EHHS on the other hand is defined as the decay of identity of haplotypes surrounding a core marker of a population as a function of distance[57].The xpEHH analyses were performed on the phased data from unrelated queens using the R package rehh (v.3.1.2)[59]with the following settings: polarize_vcf=FALSE, min_maf=0.05,

Fig. S1
Fig. S1Principal Component Analysis based on 314,756 genome-wide SNPs of queens of the pleometrotic and haplometrotic populations.The scatterplot matrix shows the four eigenvectors (i.e.principal components).Note that only the haplometrotic queens vary along PCs 2, 3 and 4.

Fig. S2
Fig. S2Admixture patterns among P. californicus populations using ADMIXTURE at K = 3 and K = 4.For each value of k, the cross-validation error rate is presented.

Fig. S4
Fig. S4 Genome-wide genetic differentiation (FST), nucleotide diversity (π), Tajima's D and xpEHH scores in the 25 largest scaffolds of the P. californicus genome.a Genetic differentiation (FST) between haplometrotic and pleometrotic queens.Blue dashed line indicates the 5% cutoff.b Nucleotide diversity () and c Tajima's D in the haplometrotic (blue) and pleometrotic (green) populations.All estimates were calculated in 100 kb non-overlapping windows.Lines and shaded areas are means and 95% confidence intervals, respectively.d log-scaled P-values and e xpEHH scores associated with SNPs across the 25 scaffolds.Black dashed line represents threshold significance for outlier SNPs (p < 0.05).Red dots highlight windows (FST) and SNPs (xpEHH) showing signatures of selection in both analyses.Scaffolds 22 and 23 containing the putative supergene are highlighted in orange.

Fig. S7
Fig. S7Linkage group (LG) 14(253.38 cM).Marker names are given on the left and genetic distance in cM (Kosambi) are given on the right sides.Both figures represent the same LG group.a "skeleton linkage group" where overlapping markers at the same position are grouped and represented only by a single marker name (i.e. the non-recombining region is represented by a single marker; thick red arrowhead).b linkage group includes all 339 markers and shows the nonrecombining area.

Fig. S8
Fig. S8Genomic architecture at linkage group (LG) 14 harboring the putative supergene in P. californicus.The heatmap displays the genotypes of the 35 queen samples from the pleometrotic and haplometrotic populations.Rows represent a total of 5,363 SNPs (thinned out of 68,643 SNPs for ease of visualization) with a MAF ≥ 0.2, spanning seven scaffolds of LG14.Columns (blue for haplometrotic and green for pleometrotic) represent individual queens grouped by population (haplometrotic and pleometrotic).Black rectangle delimits the non-recombining supergene region, spanning scaffolds22, 23, 38, 42, 48  and parts of scaffolds 6 and 14.Queen P99P is an F1 hybrid collected from a pleometrotic population.

Fig. S10
Fig. S10 Alignment of the putative P. californicus supergene on LG 14 to the genome of Solenopsis invicta.The figure displays alignments with a minimum length of 500 bp, with grey indicating forward orientation and red indicating reverse orientation.The 16 chromosomes of S. invicta, along with unplaced scaffolds (on the left; start with JAEQMK), are represented by different rectangles.Chromosome CM028747.1,highlighted in blue, corresponds to the social chromosome containing the fire ant supergene.

Fig. S11
Fig. S11Principal Component Analysis based on 69 markers spanning the supergene region showing that the 108 male haplotypes are either Sh (negative PC1 scores) or Sp (positive PC1 scores).Diploid queens are either homozygous for the Sp allele (tightly clustered green points; most of pleometrotic queens) or heterozygous Sh/Sp (widely distributed blue points; most of the haplometrotic queens).

Fig. S12
Fig. S12 Linkage disequilibrium estimates (r 2 ) across linkage group (LG) 14 containing the putative supergene for six heterozygous queens (Sh/Sp).The red rectangle shows the region of no recombination between Sh and Sp.SNPs are ordered according to physical position on the chromosome after lift over.Black lines on top of the heatmap indicate SNP positions on the physical map.

Fig. S13
Fig. S13 Distribution of pairwise genetic differentiation (FST) between haplometrotic and pleometrotic queens, estimated in 100 kb non-overlapping windows across the genome.Regions falling into the upper 5% tail of the distribution (FST > 0.423, delimited by the blue dashed line) were considered candidates for divergent selection between populations.

Fig. S16
Fig. S16Mapping quality across the 199 scaffolds of the Pcal.v2assembly.Dots represent average mapping quality in 100 kb window, calculated and averaged across the 35 queens sequenced in this study.Scaffolds 26-199 were excluded from all analyses due to fragmentation and low mapping quality.

Fig. S17
Fig. S17 Quality control of non-overlapping genomic windows (100 kb) used to compute population genomic metrics.Number of variants, mapping quality and mean coverage per 100 kb window before (a, b, and c, respectively) and after (d, e, and f, respectively) applying cutoffs (see Methods).Red bars indicate windows excluded from population genomic analyses.

Table S1
Genomic coordinates and sizes of regions belonging to the putative supergene on LG14 including whole scaffolds22, 23, 38, 42, 48, and parts of scaffolds 6 and 14.

Table S2
Results of Gene Ontology enrichment analyses using topGO for the 682 genes annotated within the putative supergene.MF: Molecular Function, FDR: FDR-adjusted p-value.

Table S3
Results for Hardy-Weinberg calculations at the supergene locus for the pleometrotic and haplometrotic populations.IDs of queen assigned to each genotype (Sh/Sh, Sp/Sp or Sh/Sp) are indicated for each population.Genotypes were assigned based on visual inspection of the supergene region (see Methods).

Table S4
Genomic coordinates of regions showing patterns of positive selective sweeps in the pleometrotic population, identified by our two selection scan approaches (FST and xpEHH).

Table S5
Quality control metrics for the sequencing data used in this study.